fix: Replace hardcoded token limits with model-specific configuration #1376
base: develop
Conversation
Problem:
- Spec creation failing with `max_tokens: 65537 > 64000` error for Opus 4.5
- Magic numbers scattered across the codebase (62000, 63999, 64000)
- No validation against model-specific limits
- SDK bug #8756 causes intermittent validation errors when max_tokens is reduced without adjusting the thinking budget

Solution:
- Created `model_limits.json` configuration file with all Claude 4.5 model limits
- All models have 64K max_output_tokens (Opus, Sonnet, Haiku 4.5)
- Set ultrathink budget to 60K (leaves a 4K buffer for SDK overhead)
- Added validation functions to cap thinking budgets to model limits
- Updated frontend constants to match the backend configuration
- Added comprehensive tests for model-specific validation

Changes:
- `apps/backend/model_limits.json`: new configuration file with model limits
- `apps/backend/phase_config.py`: load limits from config, add validation
- `apps/frontend/src/shared/constants/models.ts`: add model limit constants
- `tests/test_model_limits.py`: new tests for model-specific validation (10 tests)
- `tests/test_thinking_level_validation.py`: update ultrathink budget to 60K

Technical Details:
- API constraint: `max_tokens > thinking.budget_tokens` (strictly greater)
- All Claude 4.5 models: 64K max output, 200K context window
- Safe thinking budget: 60K tokens (4K buffer for SDK overhead)
- Graceful degradation: warns and caps excessive budgets instead of failing

Testing:
- All 19 tests pass (9 existing + 10 new)
- Validates budget capping, API constraints, and buffer requirements

Fixes SDK bug workaround for issue anthropics/claude-code#8756

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Signed-off-by: Sebastian Geraldes <199673787+sebastiangeraldes@users.noreply.github.com>
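The graceful-degradation behavior described above can be sketched as follows. This is a minimal illustration, not the actual code in `phase_config.py`; the function name mirrors the commit message, and the inline limits stand in for what `model_limits.json` would provide.

```python
# Illustrative limits; the real values live in apps/backend/model_limits.json.
MODEL_LIMITS = {
    "claude-opus-4-5-20251101": {"max_output_tokens": 64000, "max_thinking_tokens": 60000},
    "claude-sonnet-4-5-20250929": {"max_output_tokens": 64000, "max_thinking_tokens": 60000},
}
DEFAULT_LIMITS = {"max_output_tokens": 64000, "max_thinking_tokens": 60000}


def validate_thinking_budget(model: str, requested: int) -> int:
    """Warn and cap an excessive thinking budget instead of failing the request."""
    cap = MODEL_LIMITS.get(model, DEFAULT_LIMITS)["max_thinking_tokens"]
    if requested > cap:
        print(f"warning: thinking budget {requested} exceeds {cap} for {model}; capping")
        return cap
    return requested
```

With this shape, a request for the old 63,999-token ultrathink budget is silently capped to the 60K safe limit rather than rejected by the API.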
Summary of Changes — Hello @sgeraldes, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request addresses critical issues related to hardcoded token limits for Claude models, which were causing spec creation failures.
📝 Walkthrough: A new centralized model limits configuration (`model_limits.json`) replaces the hardcoded token limits across the backend and frontend.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🎉 Thanks for your first PR!
A maintainer will review it soon. Please make sure:
- Your branch is synced with `develop`
- CI checks pass
- You've followed our contribution guide
Welcome to the Auto Claude community!
Codecov Report: ❌ Patch coverage is …
Code Review
This pull request is an excellent improvement that addresses the problem of hardcoded token limits by introducing a model-specific configuration system. The new model_limits.json file centralizes all model constraints, and the backend code in phase_config.py is cleanly refactored to load, validate, and gracefully handle these limits. I particularly appreciate the addition of comprehensive tests in tests/test_model_limits.py, which ensure the new logic is robust and correct. My only suggestion for improvement relates to the duplication of model limits on the frontend, which could be addressed to further improve maintainability.
```typescript
// Model-specific output token limits (all Claude 4.5 models have 64K max_tokens)
export const MODEL_OUTPUT_LIMITS: Record<string, number> = {
  'claude-opus-4-5-20251101': 64000,
  'claude-sonnet-4-5-20250929': 64000,
  'claude-haiku-4-5-20251001': 64000,
  opus: 64000,
  sonnet: 64000,
  haiku: 64000
} as const;

// Maximum safe thinking budget for each model (leaves buffer for SDK overhead)
export const MODEL_MAX_THINKING: Record<string, number> = {
  'claude-opus-4-5-20251101': 60000,
  'claude-sonnet-4-5-20250929': 60000,
  'claude-haiku-4-5-20251001': 60000,
  opus: 60000,
  sonnet: 60000,
  haiku: 60000
} as const;
```
While centralizing the model limits on the backend is a great step, these new constants (MODEL_OUTPUT_LIMITS and MODEL_MAX_THINKING) duplicate the configuration from apps/backend/model_limits.json. This creates two sources of truth and could lead to inconsistencies if limits are updated in one place but not the other.
To improve maintainability and ensure consistency, consider creating a backend API endpoint that exposes these model limits. The frontend could then fetch this configuration when the application loads. This would make model_limits.json the single source of truth for the entire application.
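One minimal shape for that suggestion is a handler that serves `model_limits.json` verbatim. This is a framework-agnostic sketch, not the project's actual API; only the file path is taken from this PR.

```python
import json
from pathlib import Path


def get_model_limits(config_path: Path = Path("apps/backend/model_limits.json")) -> dict:
    """Hypothetical body of a GET /api/model-limits endpoint: return the
    backend's model_limits.json so the frontend need not duplicate it."""
    with open(config_path, encoding="utf-8") as f:
        return json.load(f)
```

Wired into whatever web framework the backend uses, the frontend could fetch this once at startup instead of hardcoding `MODEL_OUTPUT_LIMITS` and `MODEL_MAX_THINKING`.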
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
`apps/backend/phase_config.py` (1)

**1-17: Ruff format check is failing for apps/backend.** CI reports one file needs formatting. Please run `ruff format apps/backend/ --quiet` and commit the changes.

`tests/test_thinking_level_validation.py` (1)

**13-13: Fix import path casing to use lowercase "apps".** Line 13 uses `"Apps"` but the actual directory is lowercase `"apps"`; on case-sensitive filesystems this will fail. Please align with the repository structure and other tests like `test_model_limits.py`.

Proposed fix:

```diff
- sys.path.insert(0, str(Path(__file__).parent.parent / "Apps" / "backend"))
+ sys.path.insert(0, str(Path(__file__).parent.parent / "apps" / "backend"))
```
🤖 Fix all issues with AI agents

In `apps/backend/phase_config.py`:
- Around lines 25-31: The `_load_model_limits` function currently builds the file path via `Path(__file__).parent / "model_limits.json"`; update it to use the project's platform abstraction path API instead. Import the platform abstraction module and replace the `limits_file` construction with the platform's path/resource helper (e.g., `platform.join_path` or `platform.get_resource_path`) so path handling follows backend guidelines, while keeping the same `open(..., encoding="utf-8")` call and `return json.load(f)`.
```python
# Load model limits from configuration file
def _load_model_limits() -> dict:
    """Load model limits from model_limits.json."""
    limits_file = Path(__file__).parent / "model_limits.json"
    try:
        with open(limits_file, encoding="utf-8") as f:
            return json.load(f)
```
🛠️ Refactor suggestion | 🟠 Major
Use the platform abstraction module for path handling.
Line 28 constructs paths directly via Path. Per backend guidelines, route path handling through the project’s platform abstraction module.
AndyMik90 left a comment
🤖 Auto Claude PR Review
Merge Verdict: 🔴 BLOCKED
🔴 Blocked - 2 CI check(s) failing. Fix CI before merge.
Risk Assessment
| Factor | Level | Notes |
|---|---|---|
| Complexity | Medium | Based on lines changed |
| Security Impact | None | Based on security findings |
| Scope Coherence | Good | Based on structural review |
🚨 Blocking Issues (Must Fix)
- CI Failed: Lint Complete
- CI Failed: Python (Ruff)
Findings Summary
- Low: 3 issue(s)
Generated by Auto Claude PR Review
Findings (3 selected of 3 total)
🔵 [078e01afefcd] [LOW] [Potential] Frontend/backend configuration duplication requires manual sync
📁 apps/frontend/src/shared/constants/models.ts:26
Token limits (64000, 60000) and thinking level budgets are defined in both apps/backend/model_limits.json (source of truth) and apps/frontend/src/shared/constants/models.ts (lines 27-53). While a comment on line 26 documents the sync requirement, there's no automated validation. This is an acceptable trade-off given the complexity of sharing config between Python/TypeScript, but creates future maintenance burden.
Suggested fix:
Consider adding a CI test that compares frontend constants against backend JSON to catch drift. Alternatively, document the sync requirement more prominently in both files.
🔵 [d6f9e65c82a1] [LOW] [Potential] Cross-reference comment uses outdated path
📁 apps/backend/phase_config.py:54
Comment references 'auto-claude-ui/src/shared/constants/models.ts' but the actual frontend path is 'apps/frontend/src/shared/constants/models.ts'. This appears to be a legacy project name.
Suggested fix:
Update comment from 'auto-claude-ui/' to 'apps/frontend/'
🔵 [ea8f8d448fce] [LOW] [Potential] Missing tests for JSON file loading failure scenarios
📁 tests/test_model_limits.py:1
The _load_model_limits() function in phase_config.py handles FileNotFoundError and JSONDecodeError with fallback defaults (lines 32-48), but the test file has no coverage for these error paths. If fallback values are accidentally modified, no test would catch the regression.
Suggested fix:
Add tests that mock file loading failures: (1) Use unittest.mock.patch('builtins.open') to simulate FileNotFoundError, (2) Verify fallback dict is returned, (3) Verify warning is logged.
This review was generated by Auto Claude.
Comment is not valid. Too much code for a quick patch. Already added tests for a set of hardcoded magic numbers that got added as configurations instead. Simplify and implement quickly.
I have read the CLA Document and I hereby sign the CLA
Force-pushed from 67a743f to e83e445 (compare)
Problem
Spec creation was failing with a `max_tokens: 65537 > 64000` error for Claude Opus 4.5. The codebase had magic numbers scattered throughout (62000, 63999, 64000) with no validation against model-specific limits.

Root Cause

SDK bug anthropics/claude-code#8756: when max_tokens is reduced without adjusting the thinking budget, the SDK intermittently produces validation errors.
Solution
Created a comprehensive model-specific configuration system:
Changes
apps/backend/model_limits.json: New configuration file with all Claude 4.5 model limits
apps/backend/phase_config.py: Load limits from config, add validation
- `get_model_max_output_tokens()`: get the model's max_tokens limit
- `get_model_max_thinking_tokens()`: get the safe thinking budget
- `validate_thinking_budget()`: caps budgets to model limits with warnings

`apps/frontend/src/shared/constants/models.ts`: Add model limit constants
- `MODEL_OUTPUT_LIMITS`: 64K for all models
- `MODEL_MAX_THINKING`: 60K safe limit
- `THINKING_BUDGET_MAP`: ultrathink raised to 60K

`tests/test_model_limits.py`: New comprehensive tests (10 tests)
tests/test_thinking_level_validation.py: Updated ultrathink budget to 60K
Technical Details
Why 60,000 instead of 63,999?
- The API requires `max_tokens > thinking.budget_tokens` (strictly greater)
- SDK bug anthropics/claude-code#8756 can push effective max_tokens past the model cap, producing the `max_tokens: 65537 > 64000` error
- A 60K budget leaves a 4K buffer under the 64K output limit

Testing
✅ All 19 tests pass (9 existing + 10 new)
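The arithmetic behind the 60K choice can be checked directly. This is a trivial sketch of the constraint stated in Technical Details; the constant names are illustrative.

```python
# The API requires max_tokens strictly greater than thinking.budget_tokens.
MAX_OUTPUT_TOKENS = 64_000      # all Claude 4.5 models, per this PR
SAFE_THINKING_BUDGET = 60_000   # chosen ultrathink budget


def satisfies_api_constraint(max_tokens: int, thinking_budget: int) -> bool:
    """True when the strict inequality required by the API holds."""
    return max_tokens > thinking_budget


# 63,999 also satisfies the strict inequality, but leaves only 1 token of
# headroom; the SDK bug can inflate max_tokens past the cap (65537 > 64000),
# so a 4K buffer is used instead.
assert satisfies_api_constraint(MAX_OUTPUT_TOKENS, SAFE_THINKING_BUDGET)
assert MAX_OUTPUT_TOKENS - SAFE_THINKING_BUDGET == 4_000
```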
References
🤖 Generated with Claude Code